Last Update: 7/13/2025
LLMVision Audio Transcription API
The LLMVision Audio Transcription API converts speech in an uploaded audio file into text.
Endpoint
POST https://platform.llmprovider.ai/v1/audio/transcriptions
Request Headers
| Header | Value |
|---|---|
| Authorization | Bearer YOUR_API_KEY |
| Content-Type | multipart/form-data |
Request Body
| Parameter | Type | Description |
|---|---|---|
| file | file | The audio file object (not the file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. Maximum file size: 20 MB. |
| model | string | ID of the model to use (e.g., lmp-stt-20241013). |
| prompt | string | (Optional) Text to guide the model's style or continue a previous audio segment. |
| response_format | string | (Optional) The format of the transcript output (json, text, srt, verbose_json, or vtt). Default is json. |
| temperature | number | (Optional) The sampling temperature, between 0 and 1. Default is 0. |
| language | string | (Optional) The language of the input audio (e.g., en, es, fr). |
| timestamp_granularities[] | array | (Optional) The timestamp granularities to populate for this transcription. |
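The request-body parameters above can be sketched as a multipart form payload. This is a minimal, illustrative sketch, not provider-verified code: the model ID and parameter values are examples, and the size check reflects the 20 MB cap stated for the `file` parameter.

```python
import os

MAX_FILE_BYTES = 20 * 1024 * 1024  # 20 MB cap from the file parameter above

def build_form(path, model="lmp-stt-20241013"):
    """Assemble the non-file form fields for a transcription request.

    All values here are illustrative; only `model` is required.
    """
    if os.path.getsize(path) > MAX_FILE_BYTES:
        raise ValueError(f"{path} exceeds the 20 MB upload limit")
    return {
        "model": model,
        "response_format": "verbose_json",  # json, text, srt, verbose_json, or vtt
        "temperature": 0,
        "language": "en",
        "timestamp_granularities[]": "segment",
    }
```

The file itself is sent as a separate multipart part (see the example requests below the response description).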
Response Body
The transcription object or a verbose transcription object.
The transcription object (JSON)
| Parameter | Type | Description |
|---|---|---|
| text | string | The transcribed text. |
```json
{
  "text": "Hello, this is the transcribed text from the audio file."
}
```
The transcription object (Verbose JSON)
| Parameter | Type | Description |
|---|---|---|
| task | string | The task performed by the model. |
| language | string | The language of the input audio. |
| duration | number | The duration of the audio in seconds. |
| segments | array | Segments of the transcribed text and their corresponding details. |
| text | string | The transcribed text. |
| words | array | Extracted words and their corresponding timestamps. |
```json
{
  "task": "transcribe",
  "language": "en",
  "duration": 2.95,
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 2.95,
      "text": "Hello, this is the transcribed text from the audio file.",
      "tokens": [50364, 2425, 11, 359, 307, 1161, 1123, 422, 264, 1467, 1780],
      "temperature": 0.0,
      "avg_logprob": -0.458,
      "compression_ratio": 0.688,
      "no_speech_prob": 0.0192
    }
  ],
  "text": "Hello, this is the transcribed text from the audio file."
}
```
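Because each `verbose_json` segment carries `start` and `end` times, a response like the sample above can be rendered as SRT subtitles locally. A minimal sketch, assuming only the `start`, `end`, and `text` fields shown in the sample segment:

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, e.g. 2.95 -> 00:00:02,950."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def segments_to_srt(segments):
    """Build an SRT document from a verbose_json `segments` array."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

Note that the API can also return SRT directly via `response_format=srt`; local conversion is only useful when you need the verbose metadata as well.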
Example Request
- Shell

```shell
# curl sets the multipart/form-data Content-Type (with boundary) automatically;
# do not set it manually with -H, or the boundary will be missing.
curl -X POST https://platform.llmprovider.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $YOUR_API_KEY" \
  -F file="@audio.mp3" \
  -F model="lmp-stt-20241013"
```

- nodejs

```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

const formData = new FormData();
formData.append('file', fs.createReadStream('audio.mp3'));
formData.append('model', 'lmp-stt-20241013');

axios.post('https://platform.llmprovider.ai/v1/audio/transcriptions', formData, {
  headers: {
    'Authorization': `Bearer ${process.env.YOUR_API_KEY}`,
    ...formData.getHeaders()
  }
})
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error('Error:', error);
  });
```

- python

```python
import os
import requests

headers = {
    "Authorization": f"Bearer {os.environ['YOUR_API_KEY']}"
}

with open("audio.mp3", "rb") as audio_file:
    response = requests.post(
        "https://platform.llmprovider.ai/v1/audio/transcriptions",
        headers=headers,
        files={"file": audio_file},
        data={"model": "lmp-stt-20241013"},
    )

print(response.json())
```
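The examples above assume a successful request. A small helper for checking the HTTP status before reading `text` might look like the sketch below; the error message format is an assumption, since error payloads are not documented in this section.

```python
def extract_text(status_code, payload):
    """Return the transcribed text from a parsed JSON response, or raise.

    The error handling here is a hypothetical sketch: inspect your provider's
    actual error responses before relying on any specific error fields.
    """
    if status_code != 200:
        raise RuntimeError(f"Transcription failed ({status_code}): {payload}")
    return payload["text"]
```

Usage with the Python example above would be `extract_text(response.status_code, response.json())`.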
For any questions or further assistance, please contact us at [email protected].